Abstract

Built on a hierarchical access structure with primary and secondary users, opportunistic spectrum access improves spectrum efficiency while maintaining compatibility with legacy wireless systems. The basic idea is to allow secondary users to exploit instantaneous spectrum availability while limiting the interference to primary users. In this article, we identify basic components, fundamental trade-offs, and practical constraints in opportunistic spectrum access. We introduce a decision-theoretic framework based on the theory of partially observable Markov decision processes. This framework allows us to systematically tackle the optimal integrated design and quantitatively characterize the interaction between signal processing for opportunity identification and networking for opportunity exploitation. A discussion of open problems, potential applications, and recent developments is also provided.

Introduction 

Measurements of actual spectrum usage have revealed the pervasiveness of idle frequency bands in the seemingly crowded radio spectrum [1]. Due to bursty arrivals of wireless applications and guard bands in space, much of the prized spectrum lies unused at any given time and location. Shown in Fig. 1 is a wireless LAN traffic measurement, indicating 75 percent idle time during an active FTP session [2]. For voice-over-IP applications such as Skype, up to 90 percent idle time has been observed.

These measurements highlight the drawbacks of the current static spectrum allotment policy. There has been a flurry of activity in the engineering, economics, and regulation communities in search of dynamic spectrum access strategies for improved spectrum efficiency. Various approaches have been proposed and studied. A taxonomy of dynamic spectrum access strategies can be found in [3].

In this article we focus on the overlay approach under the hierarchical access model of dynamic spectrum access [3]. Spectrum overlay was first envisioned by Mitola [4] and then investigated in the Defense Advanced Research Projects Agency (DARPA) Next Generation (XG) program as opportunistic spectrum access (OSA). The idea is to exploit instantaneous spectrum availability by opening licensed spectrum to secondary users. It directly targets spatial and temporal spectrum white space by allowing secondary users to identify and exploit local and instantaneous spectrum availability in a nonintrusive manner. Even in unlicensed bands, OSA may be of considerable value for spectrum efficiency (e.g., by adopting a hierarchical pricing structure to support both subscribers and opportunistic users).

To realize this potential, many complex technical, economic, and regulatory issues need to be addressed. In this article we focus on technical aspects of OSA. We identify basic components, fundamental trade-offs, and practical constraints, and discuss open problems and recent advances. Based on the theory of partially observable Markov decision processes (POMDPs), we develop a decision-theoretic framework that leads to an optimal joint design of OSA, and a systematic examination of the interaction between signal processing for opportunity identification and networking for opportunity exploitation.

Technical  Challenges  and 
Design  Trade-offs 

While conceptually simple, OSA presents challenges not present in conventional wired or wireless networks. To protect spectrum licensees from interference while providing sufficient benefit to secondary users, OSA must rely on advanced signal processing techniques for instantaneous opportunity identification and sophisticated networking protocols for nonintrusive opportunity exploitation. The tension between the secondary users’ desire for performance and the primary users’ need for protection dictates the interaction between opportunity identification and opportunity exploitation, and the optimal design of OSA calls for a cross-layer approach that integrates signal processing with networking.

Basic design components of OSA include a spectrum sensor at the physical layer for opportunity identification, a sensing policy at the medium access control (MAC) layer for real-time decisions about which channels in the spectrum to sense, and an access policy, also at the MAC layer, to determine whether to access based on the sensing outcome. These three components should be jointly designed to maximize the throughput of secondary users while limiting the interference to primary users.

Spectrum Sensor: False Alarm vs. Miss Detection

The spectrum sensor of a secondary user identifies spectrum opportunities by detecting the presence of primary signals (i.e., by performing a binary hypothesis test). Sensing errors are inevitable: false alarms occur when idle channels are detected as busy, and miss detections occur when busy channels are detected as idle. In the event of a false alarm, a spectrum opportunity is overlooked by the sensor, and eventually wasted if the access policy trusts the sensing outcome. On the other hand, miss detections may lead to collisions with primary users. While both types of sensing errors are undesirable, reducing the occurrence of one generally comes at the price of increasing the occurrence of the other. Consider, for example, an energy detector. Choosing a larger energy detection threshold reduces the probability of a false alarm but increases the probability of miss detection. The trade-off between false alarm and miss detection is thus an important issue and should be addressed by considering the impact of sensing errors on the MAC layer performance in terms of throughput and collision probability. On a more fundamental level, which criterion should be adopted in the design of the spectrum sensor, Bayes or Neyman-Pearson (NP)? If the former, how do we choose the risks? If the latter, how should we set the constraint on the probability of false alarm?
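
The threshold trade-off described above for the energy detector can be made concrete with a small numerical sketch. The model below is an illustrative assumption, not taken from the article: real Gaussian noise of unit variance, a real Gaussian primary signal with a given linear SNR, and a central-limit (Gaussian) approximation of the averaged energy. The function names are hypothetical.

```python
import math


def q_function(x: float) -> float:
    """Tail probability of a standard normal variable: Q(x) = P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def energy_detector_errors(threshold: float, snr: float, num_samples: int):
    """Return (false_alarm, miss_detection) for an energy detector.

    Assumed model (illustrative only): the detector averages the energy of
    num_samples real samples and declares "busy" when the average exceeds
    the threshold. Noise has unit variance; the primary signal is Gaussian
    with linear SNR snr. Both hypotheses use a Gaussian approximation.
    """
    std_idle = math.sqrt(2.0 / num_samples)              # idle channel: mean 1
    mean_busy = 1.0 + snr                                # busy channel mean
    std_busy = mean_busy * math.sqrt(2.0 / num_samples)
    false_alarm = q_function((threshold - 1.0) / std_idle)
    miss_detection = 1.0 - q_function((threshold - mean_busy) / std_busy)
    return false_alarm, miss_detection


if __name__ == "__main__":
    # Raising the threshold lowers false alarms and raises miss detections.
    for tau in (1.1, 1.3, 1.5):
        eps, delta = energy_detector_errors(tau, snr=0.5, num_samples=100)
        print(f"threshold={tau:.1f}  false alarm={eps:.4f}  miss detection={delta:.4f}")
```

Sweeping the threshold in this way traces out one operating curve of the sensor; which point on it to use is the design question raised in the paragraph above.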

Sensing  Policy:  Gaining  Immediate  Access  vs. 
Gaining  Information  for  Future  Use 

Due to hardware limitations and the energy cost of spectrum monitoring, a secondary user may not be able to sense all the channels in the spectrum simultaneously. A sensing policy is thus necessary for intelligent channel selection to track the rapidly varying spectrum opportunities. The purpose of the sensing policy is twofold: catch a spectrum opportunity for immediate access, and obtain statistical information on spectrum occupancy for better opportunity tracking in the future. A balance has to be reached between these two often conflicting objectives, and the trade-off should adapt to the bursty traffic and energy constraint of the secondary user. When the user has no data to transmit, is it worthwhile to continue spending energy on spectrum monitoring? If so, how should the sensing policy change given that immediate spectrum access is no longer necessary? Clearly, such decisions should be made by taking into account the accuracy and energy consumption characteristics of the spectrum sensor.


■ Figure 1. A wireless LAN traffic measurement during an active FTP session.


Access  Policy:  Aggressive  vs.  Conservative 

Based on the imperfect sensing outcomes given by the spectrum sensor, the secondary user needs to decide whether to access the channel. The objective of the access policy is to minimize the chance of overlooking an opportunity without violating the constraint of being nonintrusive. Whether the secondary user should adopt an aggressive or a conservative access policy depends on the operating characteristics (probability of false alarm vs. probability of miss detection) of the spectrum sensor, and the sensor and access policy must be designed jointly for optimality.

The above discussion provides a glimpse into the design complexity of OSA in a dynamic network environment with fading, random traffic, energy constraints, and competing distributed users. Is the optimal joint design tractable? Even if we arrive at an optimal solution, will it be too complicated to implement and too sensitive to environmental changes to be useful?

A  Decision-Theoretic  Framework 
Based  on  POMDP 

As an initial attempt to address the technical challenges outlined above, we introduce a decision-theoretic framework. Based on the theory of POMDPs, this framework integrates the three basic components of OSA, leading to an optimal joint design of signal processing algorithms for opportunity identification and networking protocols for opportunity exploitation.

POMDPs often suffer from the curse of dimensionality. The constraint on interference to primary users further complicates the problem. We have shown that, surprisingly, the structure of OSA admits a separation principle that decouples the design of the sensing policy from that of the spectrum sensor and access policy. This separation principle reveals the optimality of the myopic approach to the design of the spectrum sensor and access policy, leading to closed-form optimal solutions. Furthermore, the design of the sensing policy is reduced to an unconstrained POMDP, where optimality can be achieved with deterministic policies. These results suggest a favorable trade-off between optimality and complexity of the OSA design.
■ Figure 2. Illustration of the set of all feasible sensor operating points (the operating point $(\epsilon, \delta)$ can be achieved by randomizing between the optimal NP detectors designed under two different constraints on the false alarm probability).


Network  Model 

Consider a spectrum consisting of $N$ channels, each with bandwidth $B_i$ ($i = 1, \ldots, N$). These $N$ channels are allocated to a network of primary users who communicate according to a synchronous slot structure. The traffic statistics of the primary network are such that the occupancy of these $N$ channels follows a discrete-time Markov process with $2^N$ states, where the state is defined as the availability (idle or busy) of each channel.

We consider a group of secondary users seeking spectrum opportunities in these $N$ channels. We focus on an ad hoc network where secondary users sense and access the spectrum independently. In each slot, a secondary user chooses a set of channels to sense and a set of channels to access based on the sensing outcome. Limited by its hardware constraints and energy supply, a secondary user can sense no more than $L_1$ ($L_1 \le N$) and access no more than $L_2$ ($L_2 \le L_1$) channels in each slot. To simplify notation and illustrate the basic idea, we consider $L_1 = L_2 = 1$. The decision-theoretic framework presented in this article, however, applies to the general case.


Basic Design Components

As noted earlier, OSA has three basic design components: a spectrum sensor, a sensing policy, and an access policy.

Spectrum Sensor — By performing a binary hypothesis test, the spectrum sensor detects the presence of primary users in a chosen channel. The probabilities of false alarm and miss detection $(\epsilon, \delta)$, referred to as the receiver operating characteristic (ROC), specify the performance of the spectrum sensor. For a given $\epsilon$, the largest achievable probability of detection $P_{D,\max}(\epsilon)$ (or equivalently, the smallest achievable probability of miss detection $\delta_{\min}(\epsilon) = 1 - P_{D,\max}(\epsilon)$) can be attained by the optimal NP detector with the constraint that the false alarm probability is no larger than $\epsilon$, or by an optimal Bayesian detector with a suitable set of risks [5, Sec. 2.2.1]. As illustrated in Fig. 2, the best ROC curve $P_{D,\max}$ forms the upper boundary of the feasible set of operating points. We also note that every feasible operating point $(\epsilon, \delta)$ lies on a line that connects two boundary points and hence can be achieved by randomizing between two optimal NP detectors with properly chosen constraints on the probability of false alarm [5, Sec. 2.2.2]. Therefore, the design of the spectrum sensor is reduced to the choice of the desired sensor operating point. In other words, our objective is to find, sequentially in each slot, the optimal sensor operating point $(\epsilon^*, \delta^*)$ in the feasible set to achieve the best trade-off between false alarm and miss detection. Note that the optimal operating point may vary from slot to slot.

Sensing and Access Policies — The sensing policy decides, sequentially in each slot, which channel to sense, and the access policy determines whether to transmit based on the sensing outcome. When the secondary user accesses an idle channel, a reward is accrued in this slot (e.g., we can define reward as the number of bits delivered). On the other hand, a collision with primary users occurs when accessing a busy channel.

The joint design of OSA is to choose the sensing and access policies together with the sensor operating policy that specifies the operating point $(\epsilon, \delta)$ in each slot. The objective is to maximize the total expected reward accumulated over time under the constraint that the probability of colliding with primary users is capped below a prescribed level.

Constrained POMDP Formulation

Due to partial spectrum monitoring and sensing errors, the internal state of the underlying Markov process that models spectrum occupancy cannot be fully observed. Considering the constraint on the collision probability, we can formulate the joint design of OSA as a constrained POMDP as detailed below.

Reward and Objective Functions — A natural definition of reward is the number of bits delivered. For example, the reward for accessing an idle channel $a$ can be defined as the bandwidth of channel $a$:

$$R = B_a.$$

In  a  fading  environment,  the  reward  may  also 
depend  on  the  random  fading  gain  of  channel  a. 

We  can  define  the  objective  function  as  the 
expected  total  number  of  bits  transmitted  in  T 
slots: 


$$J = E\left[\sum_{t=1}^{T} R(t)\right] \tag{1}$$


Note that the reward $R(t)$ obtained in slot $t$ depends on the sensing action (which channel to choose), the access action (whether to transmit), the sensor operating point $(\epsilon(t), \delta(t))$, and the state of the underlying Markov process (channel availability) in slot $t$.

IEEE Wireless Communications • August 2007

This objective function is particularly appropriate when the underlying Markovian model only holds for a small number of slots due to rapid variations in spectrum occupancy statistics. When the spectrum usage of primary users is relatively static, we can use the transmission rate averaged over an infinite horizon or the total discounted bits as the objective:


$$J = \lim_{T \to \infty} \frac{1}{T} E\left[\sum_{t=1}^{T} R(t)\right] \quad \text{or} \quad J = E\left[\sum_{t=1}^{\infty} \eta^{t-1} R(t)\right],$$

where $0 < \eta < 1$ is the discount factor. The latter is more appropriate for delay-sensitive messages where transmissions in the future are less rewarding.
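
The three objectives discussed above differ only in how the per-slot rewards are aggregated. The sketch below evaluates each one on a single simulated reward sequence; the expectation in the objective would be estimated by averaging over many such sequences. The function names are hypothetical, and the discount is assumed to be applied as $\eta^{t-1}$ with slots numbered from 1, a common convention.

```python
def total_reward(rewards):
    """Finite-horizon objective on one sample path: sum of R(t), t = 1..T."""
    return sum(rewards)


def average_reward(rewards):
    """Average-rate objective on one sample path, truncated to T slots."""
    return sum(rewards) / len(rewards)


def discounted_reward(rewards, eta):
    """Discounted objective on one sample path.

    Assumes the reward of slot t (t = 1, 2, ...) is weighted by eta**(t - 1),
    so enumerate() starting at 0 supplies the exponent directly.
    """
    return sum((eta ** k) * r for k, r in enumerate(rewards))


if __name__ == "__main__":
    bits_per_slot = [1.0, 0.0, 1.0, 1.0]   # hypothetical rewards R(1)..R(4)
    print(total_reward(bits_per_slot))
    print(average_reward(bits_per_slot))
    print(discounted_reward(bits_per_slot, eta=0.5))
```

With a small discount factor, late transmissions contribute little, which matches the remark that the discounted form suits delay-sensitive traffic.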


Constraint and Joint Design — The design constraint is on the interference to primary users. Let $\zeta$ denote the maximum probability of collision allowed in any channel and any slot. Using the objective function defined in Eq. 1, we can formulate the joint design of OSA as finding the optimal sensor operating policy $\pi_\delta^*$, the optimal sensing policy $\pi_s^*$, and the optimal access policy $\pi_c^*$ given by

$$\{\pi_\delta^*, \pi_s^*, \pi_c^*\} = \arg\max_{\pi_\delta, \pi_s, \pi_c} E\left[\sum_{t=1}^{T} R(t)\right] \quad \text{subject to } P_c \le \zeta, \tag{2}$$

where $P_c$ is the probability of collision determined by the chosen $\{\pi_\delta, \pi_s, \pi_c\}$.


Sufficient Statistic — The key to choosing the optimal actions in a given slot is the knowledge of the current state of the underlying Markov process. While the system state cannot be directly observed, the user can infer it from its decision and observation history. As shown in [6], the statistical information about the system state provided by the entire decision and observation history can be encapsulated in a belief vector $\Lambda(t) = [\lambda_1(t), \ldots, \lambda_{2^N}(t)]$, where $\lambda_j(t)$ is the conditional probability (given the decision and observation history) that the system state is $j$ at the beginning of slot $t$. Smallwood and Sondik have shown that this belief vector is a sufficient statistic [6]. Thus, a sensor operating policy $\pi_\delta$ defines the mapping from the current belief vector $\Lambda(t)$ to the sensor operating point $(\epsilon(t), \delta(t))$ used in this slot. Similarly, a sensing policy $\pi_s$ maps $\Lambda(t)$ to the index of the channel to be sensed in this slot, and an access policy $\pi_c$ maps $\Lambda(t)$ and the sensing outcome to the access decision. With a finite horizon $T$, the optimal policies are usually nonstationary; that is, the mapping from $\Lambda(t)$ to actions varies with time.

For a constrained POMDP (as we have here), we often need to resort to randomized policies to achieve optimality. In this case, $\pi_s$ determines the probability of choosing each channel, $\pi_c$ the transmission probability, and $\pi_\delta$ the probability density function of $(\epsilon, \delta)$. Due to the continuous action space, randomized policies are computationally prohibitive and cumbersome to implement. Fortunately, as described below, the structure of the problem admits a separation principle that leads to deterministic policies without sacrificing optimality.

■ Figure 3. An illustration of the interaction between the PHY layer spectrum sensor and the MAC layer access strategy ($\epsilon$: probability of false alarm, $\delta$: probability of miss detection, $\zeta$: maximum allowable collision probability).

Optimal  Joint  Design  and 
Separation  Principle 

We have established a separation principle for the joint design of OSA that provides a simple and explicit optimal solution to a seemingly intractable problem [7]. We have shown that the joint design can be carried out in two steps without losing optimality:

• Obtain the optimal sensor operating policy $\pi_\delta^*$ and the optimal access policy $\pi_c^*$ by maximizing the instantaneous reward $R(t)$ in the current slot under the collision constraint.

• Obtain the optimal sensing policy $\pi_s^*$ to maximize the objective function $J$ given in Eq. 1 using $\pi_\delta^*$ and $\pi_c^*$ obtained in the first step.

The separation principle decouples the design of the sensing policy from that of the spectrum sensor and access policies. As a consequence, the design of the sensing policy is reduced to an unconstrained POMDP, where optimality is achieved with deterministic policies. Furthermore, it reveals that the optimal sensor operating policy $\pi_\delta^*$ and the optimal access policy $\pi_c^*$ can be obtained from a myopic approach that focuses solely on the instantaneous reward and ignores the impact of the current actions on the future reward. The joint design of $\pi_\delta$ and $\pi_c$ is thus reduced to a static optimization problem with a simple, time-invariant, and closed-form solution. This closed-form optimal design of $\pi_\delta$ and $\pi_c$ also allows us to quantitatively characterize the interaction between the physical (PHY) layer spectrum sensor and the MAC layer access strategy.

Optimal Spectrum Sensor and Access Policy in Closed Form — As illustrated in Fig. 3, the set of feasible sensor operating points is partitioned into two regions by the maximum allowable collision probability $\zeta$: the “conservative” region ($\delta > \zeta$) and the “aggressive” region ($\delta < \zeta$). When the sensor operates at $\delta > \zeta$, there is a high chance that a busy channel is detected as idle. This suggests that the access policy should be conservative to ensure that the collision probability is capped below $\zeta$. Indeed, as shown in [7], when the channel is detected as busy, the user should always refrain from transmission; even when the channel is detected as available, it should only transmit with probability $\zeta/\delta < 1$.

On the other hand, in the region where $\delta < \zeta$, false alarms are more likely to happen. To reduce overlooked opportunities, the user should adopt an aggressive access policy: when the channel is detected as available, always transmit; even when the channel is detected as busy, still transmit with probability $(\zeta - \delta)/(1 - \delta) > 0$.

When the sensor operates at $\delta = \zeta$, the optimal access policy simply trusts the sensor: access if and only if the channel is detected as available. In other words, the access policy does not need to be conservative or aggressive to balance the occurrence of false alarms and miss detections. Note that at this point the access policy becomes deterministic. Interestingly, the optimal joint design of OSA defined in Eq. 2 requires that the sensor operate at this transition point $\delta^* = \zeta$ on the best ROC curve $P_{D,\max}$ in each slot, independent of the belief vector [7]. As a consequence, the optimal policies $\pi_\delta^*$, $\pi_s^*$, $\pi_c^*$ are all deterministic.

The separation principle allows us to obtain, in closed form, the optimal access policy for any feasible spectrum sensor, as well as the optimal joint design. Extensions of the separation principle to the multichannel sensing case can be found in [8].
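
The closed-form access rule described in the preceding paragraphs can be written down in a few lines. The sketch below is a direct transcription of the three cases, with $\delta$ the probability of miss detection and $\zeta$ the maximum allowable collision probability. The function names are hypothetical, and the collision probability is assumed here to mean the probability of transmitting given that the sensed channel is busy.

```python
def access_probabilities(delta: float, zeta: float):
    """Return (f_idle, f_busy): the probabilities of transmitting when the
    channel is detected as idle and as busy, respectively.

    delta: sensor probability of miss detection.
    zeta:  maximum allowable collision probability.
    """
    if not (0.0 <= delta <= 1.0 and 0.0 <= zeta <= 1.0):
        raise ValueError("delta and zeta must be probabilities")
    if delta > zeta:
        # Conservative region: never transmit on "busy", back off on "idle".
        return zeta / delta, 0.0
    if delta < zeta:
        # Aggressive region: always transmit on "idle", sometimes on "busy".
        return 1.0, (zeta - delta) / (1.0 - delta)
    # Transition point delta == zeta: simply trust the sensor.
    return 1.0, 0.0


def collision_probability(delta: float, f_idle: float, f_busy: float) -> float:
    """P(transmit | channel busy): a busy channel is detected as idle with
    probability delta and as busy with probability 1 - delta."""
    return delta * f_idle + (1.0 - delta) * f_busy


if __name__ == "__main__":
    zeta = 0.05
    for delta in (0.20, 0.05, 0.02):
        f_idle, f_busy = access_probabilities(delta, zeta)
        pc = collision_probability(delta, f_idle, f_busy)
        print(f"delta={delta:.2f}  f_idle={f_idle:.3f}  f_busy={f_busy:.3f}  Pc={pc:.3f}")
```

Under the stated assumption, all three cases meet the collision constraint with equality, which is consistent with the aim of not leaving any allowed interference margin unused.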

Low-Complexity Design of the Sensing Policy — We consider now the optimal sensing policy. This is a standard unconstrained POMDP to which solutions can be found in [6]. Our focus here is complexity reduction by exploiting the underlying structure of OSA.

An analysis given in [9] shows that the computational complexity of obtaining the optimal sensing policy is $O(N^T)$, which grows exponentially with the horizon length $T$. The complexity mainly comes from the dimension $2^N$ of the sufficient statistic $\Lambda$, the foresighted planning for maximizing the overall throughput, and the continuously growing observation history. To achieve a favorable trade-off between optimality and complexity, we explore the possibility of circumventing each of these three sources of high complexity.

It has been shown in [10] that when channels evolve independently, we can find a sufficient statistic whose dimension grows linearly instead of exponentially with the number $N$ of channels. Specifically, let $\Omega = [\omega_1, \ldots, \omega_N]$, where $\omega_i$ is the (marginal) conditional probability that channel $i$ is available at the beginning of a slot. $\Omega$ is a sufficient statistic if the channels are independent. This result points to the possibility of significantly reducing the computation and storage complexity of the optimal sensing policy.
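
To illustrate why a per-channel statistic is cheap to maintain, the following sketch updates the marginal idle probabilities of independent channels after one sensing outcome. It is a minimal sketch under stated assumptions, not the update rule of [10]: every channel is a two-state Markov chain with the same transition probabilities `p11` (idle to idle) and `p01` (busy to idle), the belief is updated from the sensing outcome alone via Bayes' rule, and all names are hypothetical.

```python
def update_belief(omega, sensed, detected_idle, eps, delta, p11, p01):
    """Return the idle probabilities at the start of the next slot.

    omega:         list of current idle probabilities, one per channel.
    sensed:        index of the channel sensed in this slot.
    detected_idle: True if the sensor reported the channel as idle.
    eps, delta:    false alarm and miss detection probabilities.
    p11, p01:      P(idle -> idle) and P(busy -> idle), assumed identical
                   for all channels.
    """
    posterior = list(omega)
    w = omega[sensed]
    if detected_idle:
        # Idle channel reported idle w.p. 1 - eps; busy one w.p. delta.
        num = w * (1.0 - eps)
        den = num + (1.0 - w) * delta
    else:
        # Idle channel reported busy w.p. eps; busy one w.p. 1 - delta.
        num = w * eps
        den = num + (1.0 - w) * (1.0 - delta)
    if den > 0.0:
        posterior[sensed] = num / den
    # One step of the Markov dynamics, applied to every channel.
    return [p * p11 + (1.0 - p) * p01 for p in posterior]


if __name__ == "__main__":
    belief = [0.5, 0.5]
    belief = update_belief(belief, sensed=0, detected_idle=True,
                           eps=0.1, delta=0.1, p11=0.8, p01=0.2)
    print(belief)
```

The state kept per user is a list of $N$ numbers, and each update touches every entry once, in contrast to a belief over all $2^N$ joint occupancy states.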

The alternative to foresighted planning is the myopic approach that aims solely at maximizing the immediate reward. As revealed by the separation principle, a myopic approach to the design of the spectrum sensor and access policy leads to the optimal solution. A myopic sensing policy, unfortunately, is generally suboptimal. An interesting finding is that when channels evolve as independent and identical Markov processes, a myopic approach is also optimal for the design of the sensing policy; we no longer need to trade immediate spectrum access for spectrum occupancy information [11]. Furthermore, a myopic sensing policy has a simple structure; selecting channels in each slot is reduced to a counting procedure. The secondary user only needs to set up pointers indicating the channels to which the last visits occurred most recently or the longest time ago [11].
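
A generic myopic rule simply senses the channel with the largest expected immediate reward. The sketch below is an illustration on per-channel idle probabilities and is not the counting procedure of [11]. It assumes that the access policy trusts the sensor, so the bandwidth of a channel is earned when the channel is idle and is detected as idle; the names are hypothetical.

```python
def myopic_channel(omega, bandwidth, eps):
    """Return the index of the channel with the largest expected immediate
    reward omega[i] * (1 - eps) * bandwidth[i].

    omega:     idle probability of each channel at the start of the slot.
    bandwidth: reward (e.g., bits per slot) of each channel when accessed idle.
    eps:       false alarm probability of the sensor (same for all channels).
    """
    scores = [w * (1.0 - eps) * b for w, b in zip(omega, bandwidth)]
    # max() returns the first index in case of ties.
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    print(myopic_channel([0.3, 0.6, 0.5], [1.0, 1.0, 1.0], eps=0.1))
    print(myopic_channel([0.3, 0.6, 0.5], [3.0, 1.0, 1.0], eps=0.1))
```

With equal bandwidths the rule reduces to picking the channel most likely to be idle; unequal bandwidths can shift the choice toward a wider but less likely channel.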

The key to truncating the observation history without decimating performance is to exploit the mixing time of the underlying Markov process [9]. The mixing time quantifies how long it takes for the Markov process to approach its stationary distribution. When the Markovian dynamics of the spectrum occupancy have a mixing time of $M$, sensing outcomes obtained more than $M$ slots ago provide little information on the current channel state. We can thus truncate the observation history to $M$ slots, and the sufficient statistic $\Omega$ takes only a small number of values. Thus, the computational complexity of the optimal sensing policy is reduced from $O(N^T)$ to $O(N^M T)$, which is linear, rather than exponential, in the horizon length $T$. More importantly, this result suggests a systematic way of trading off performance with complexity by choosing an appropriate truncation parameter $M$.
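
The fading value of old observations can be seen on a single two-state channel. In the sketch below, which assumes a two-state Markov chain with hypothetical transition probabilities `p11` (idle to idle) and `p01` (busy to idle), the idle probability is propagated forward from a slot in which the channel was known to be idle. The gap to the stationary probability shrinks by the factor `p11 - p01` in every slot.

```python
def stationary_idle_probability(p11: float, p01: float) -> float:
    """Stationary probability of the idle state of a two-state chain.
    Requires 1 - p11 + p01 > 0 (the chain is not frozen in both states)."""
    return p01 / (1.0 - p11 + p01)


def propagate(omega: float, p11: float, p01: float, steps: int) -> float:
    """Idle probability after the given number of unobserved slots."""
    for _ in range(steps):
        omega = omega * p11 + (1.0 - omega) * p01
    return omega


if __name__ == "__main__":
    p11, p01 = 0.8, 0.2
    pi_idle = stationary_idle_probability(p11, p01)
    for k in (1, 2, 5, 10, 20):
        gap = propagate(1.0, p11, p01, k) - pi_idle
        print(f"slots since observation={k:2d}  gap to stationary={gap:.6f}")
```

Once the gap is negligible, an observation that old can be dropped from the history with little loss, which is the intuition behind choosing the truncation parameter $M$.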

Open  Problems 

The decision-theoretic framework presented here captures the fundamental design trade-offs in OSA: false alarms vs. miss detections of the spectrum sensor, aggressiveness vs. conservativeness of the access policy, and gaining spectrum access vs. gaining spectrum information in the sensing strategy. Many problems in both fundamental theories and practical implementations, however, remain open.

Theoretical  Aspects 

We have assumed that the transition probabilities of the underlying Markov process that models spectrum occupancy are known or have been learned accurately. Simulation results suggest that OSA design under this POMDP framework is robust to model mismatch [7]. OSA with an unknown Markov model is an interesting yet nontrivial problem. With an unknown model, a secondary user learns a good policy by comparing the observation and action trajectories under different policies and correlating rewards with actions. Formulations and algorithms for POMDP with an unknown model exist in the literature [12]. They provide useful tools for solving this problem.

The results on the low-complexity design of the sensing policy apply only to independent channels. Generalizations to systems consisting of dependent channels remain open. Furthermore, the robustness of the optimal design to mismatched and time-varying spectrum occupancy models needs in-depth investigation. Answers to these questions will establish the fundamental trade-off across optimality, complexity, and robustness of this framework.

Energy constraints can further enrich the problem. The cost in each slot consists of the energy consumed in both sensing and transmission over fading channels. The design objective is to maximize the number of bits transmitted during the battery lifetime of a user subject to a constraint on the probability of collision. Under the energy constraint, the user may choose not to transmit when the available channel suffers from severe fading, leading to protocols that are opportunistic in both time and spectrum. The user may even skip sensing when the current belief vector indicates that no channel is likely to be available. Preliminary results on energy-constrained OSA in a fading environment can be found in [13].

Also  of  interest  are  cooperating  schemes 
where  secondary  users  sense  and  share  partial 
spectrum  maps  [14].  Challenges  here  include 
characterization  of  the  overhead  associated  with 
cooperation  and  the  design  of  optimal  policies. 

Protocol  Implementation  Aspects 

We have not considered protocol implementation specifics in a general multihop ad hoc network with competing secondary users. In a general network the state of spectrum occupancy can be location-dependent; a channel available at a transmitter may not be available at the corresponding receiver. Furthermore, the ability to deal with hidden and exposed terminals and collisions among secondary users is crucial to the efficiency of OSA. Transceiver synchronization is also an important issue. In the presence of collisions and sensing errors, ensuring that a secondary user and its intended receiver hop synchronously in the spectrum with minimal control message exchange is a challenge not present in conventional MAC design.

An  initial  attempt  at  addressing  the  above 
issues  can  be  found  in  [10].  Many  questions, 
however,  remain  unanswered.  How  can  we  fur¬ 
ther  reduce  collisions  among  secondary  users 
caused  by  hidden  terminals  and  wasted  spectrum 
opportunities  caused  by  exposed  terminals?  Are 
classic  collision  avoidance  schemes  such  as  busy 
tone  and  dual  busy  tone  feasible  for  OSA  where 
we  may  not  have  a  dedicated  channel  for  the 
transmission  of  busy  tones?  What  is  the  optimal 
power  control  for  multihop  ad  hoc  OSA  net¬ 
works?  Since  power  control  determines  the  area 
within  which  primary  users  may  be  affected  by  a 
particular  secondary  user,  how  do  we  choose  the 
transmission  power  of  secondary  users  based  on 
that  of  primary  users?  How  do  the  maximum 
allowable  collision  probability,  channel  fading, 
and  sensing  errors  affect  power  control?  Existing 
techniques  for  conventional  ad  hoc  networks 
may  inspire  new  ideas  to  address  these  unique 
challenges  in  OSA. 
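
To make the link between transmission power and the affected area concrete, the following sketch assumes a simple power-law path loss model in which received power decays as distance raised to the negative path loss exponent. The model, the function names, and the notion of a single tolerable interference level at primary receivers are all assumptions made here for illustration; the questions above remain open for realistic channels with fading and sensing errors.

```python
def interference_radius(tx_power: float, tolerable_power: float,
                        path_loss_exponent: float) -> float:
    """Distance beyond which interference falls below tolerable_power.

    Assumes received power = tx_power * d ** (-path_loss_exponent).
    """
    if tx_power <= 0 or tolerable_power <= 0 or path_loss_exponent <= 0:
        raise ValueError("all arguments must be positive")
    return (tx_power / tolerable_power) ** (1.0 / path_loss_exponent)


def max_tx_power(radius: float, tolerable_power: float,
                 path_loss_exponent: float) -> float:
    """Largest power keeping interference tolerable at and beyond radius."""
    if radius <= 0 or tolerable_power <= 0 or path_loss_exponent <= 0:
        raise ValueError("all arguments must be positive")
    return tolerable_power * radius ** path_loss_exponent
```

Under this model, raising the secondary user's power enlarges the region in which primary users may be affected, so any power control rule implicitly trades secondary throughput against the number of primary users exposed to interference.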

Concluding  Remarks 

In  this  article  we  have  outlined  some  of  the  tech¬ 
nical  challenges  of  OSA  and  made  an  initial 
attempt  at  establishing  a  theoretical  framework 
within  which  these  challenges  can  be  systematically
and  collectively  addressed.  We  conclude
this  article  with  a  brief  overview  of  strategic
applications  envisioned  for  OSA,  and  exciting 
research  activities  in  the  communications  and 
networking  communities.  The  former  sketches
some  of  the  many  promises  of  OSA;  the  latter
reflects  engineers’  efforts  to  determine  whether
these  promises  will  be  fulfilled.

Potential  Applications 

Both  commercial  and  military  applications  of 
OSA  have  been  envisioned.  Consider,  for  exam¬ 
ple,  sensor  networks  deployed  for  carbon  monox¬ 
ide  or  traffic  monitoring  in  metropolitan  areas, 
opportunistic  WiFi  users  at  airports,  or  military 
units  penetrating  deep  into  unknown  territory.

OSA  presents  an  attractive  approach  to  rapid 
deployment  crucial  to  applications  for  disaster 
relief  and  emergency  response.  As  an  example, 
consider  a  disaster  relief  scenario  where  multiple 
rescue  teams  from  different  agencies  and  states 
may  come  together.  The  composition  of  such
teams  is  likely  to  change  dynamically  over
the  course  of  the  rescue  effort.  A  related  example
is  that  of  a  multination  coalition  force  that
may  be  involved  in  full-scale  military  operations, 
peacekeeping,  and  humanitarian  relief  opera¬ 
tions  in  spatially  contiguous  areas.  Such  a  force 
will  probably  rely  on  multiple  sensor  networks, 
some  of  which  may  be  deployed  as  needed,  to 
provide  actionable  intelligence.  When  the  tempo 
of  operations  is  high,  it  would  be  difficult  and, 
even  if  possible,  wasteful,  to  pre-allocate  spec¬ 
trum  resources  to  the  various  actors  and  agents. 

Tactical  wireless  networks  are  closed-loop 
systems  with  delay,  partial  models,  and  inaccu¬ 
rate  knowledge  of  various  parameters.  As  a  con¬ 
sequence,  they  fall  naturally  under  the  purview 
of  partially  observable  Markov  decision  process¬ 
es  we  have  discussed.  Elements  of  the  network 
must  sense,  decide,  and  actuate.  For  such  a  com¬ 
plex  combat  system  with  heavy  traffic,  large 
scale,  and  heterogeneous  wireless  devices,  OSA
may  be  the  key  to  integrated  sensing,  communi¬ 
cation,  and  actuation. 
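
The sense-decide-actuate cycle under partial observation can be sketched for a single channel as follows. This is a minimal illustration under stated assumptions, not a full policy: the channel is assumed to follow a two-state (busy/idle) Markov chain, and the parameter names p01, p11, p_fa, and p_md (transition probabilities into the idle state, and the sensor's false alarm and missed detection probabilities) are introduced here for illustration.

```python
def predict(belief: float, p01: float, p11: float) -> float:
    """Propagate the idle probability one slot through the Markov chain.

    p01: P(idle next slot | busy now); p11: P(idle next slot | idle now).
    """
    return belief * p11 + (1.0 - belief) * p01


def update(belief: float, sensed_idle: bool, p_fa: float, p_md: float) -> float:
    """Bayes update of the idle probability after an imperfect sensing outcome.

    p_fa: P(sensed busy | idle); p_md: P(sensed idle | busy).
    """
    if sensed_idle:
        num = belief * (1.0 - p_fa)
        den = num + (1.0 - belief) * p_md
    else:
        num = belief * p_fa
        den = num + (1.0 - belief) * (1.0 - p_md)
    # If the outcome has zero probability under the model, keep the prior.
    return num / den if den > 0 else belief
```

In each slot a node would call update after sensing, decide whether to act based on the resulting belief, and then call predict to carry that belief into the next slot. The belief is the only summary of past observations the node needs to retain.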

Related  Work 

In  this  article  we  have  mainly  focused  on  the 
exploitation  of  temporal  spectrum  opportunities 
resulting  from  the  bursty  traffic  of  primary 
users.  There  is  also  a  growing  body  of  literature 
focusing  on  spatial  spectrum  opportunities  that 
are  static  or  slowly  varying  in  time.  Example 
applications  include  the  reuse  of  certain  TV 
bands  that  are  not  used  for  TV  broadcast  in  a 
particular  region.  Due  to  the  slow  temporal 
variation  of  spectrum  occupancy,  opportunity 
identification  is  not  as  critical  a  component  in 
this  class  of  applications,  and  existing  work 
along  this  line  often  assumes  perfect  knowledge 
of  spectrum  opportunities  in  the  whole  spec¬ 
trum  at  any  location. 

At  the  physical  layer,  opportunity  identifica¬ 
tion  in  the  presence  of  fading  and  noise  uncer¬ 
tainties  has  been  studied  [15].  Cognitive  radio, 
the  physical  platform  of  OSA,  has  also  received 
increasing  attention  recently.  Spectrum  monitor¬ 
ing  testbeds  [2]  and  cognitive  radio  prototypes 
[16]  are  being  developed  by  researchers  from 
both  academia  and  industry.  They  validate  the 
feasibility  and  practical  value  of  theoretical 
research  and  provide  empirical  data  for  spectrum
occupancy  modeling.

This  list  is  by  no  means  complete.  For  an 
overview  of  recent  developments  in  OSA,  readers
are  referred  to  [3]  and  to  the  proceedings  of
workshops  and  conferences  such  as  DySPAN  and
CrownCom.